Quantum circuits which perform integer arithmetic could potentially outperform their
classical counterparts. In this paper, a quantum circuit is considered which performs
a specific computational pattern on classically represented integers to accelerate the
computation. Such a hybrid circuit could be embedded in a conventional computer
architecture as a quantum device or accelerator. In particular, a quantum multiply-add
circuit (QMAC) using a Quantum Fourier Transform (QFT) is proposed which can
perform the calculation on conventional integers faster than its conventional counterpart.
Whereas classically applying a multiply-adder (MAC) $n$ times to $k$ bit integers would
require $O(n \log k)$ parallel steps, the hybrid QMAC needs only $O(n + k)$ steps for the
exact result and $O(n + \log k)$ steps for an approximate result.

1 Introduction

Quantum computing has the potential to dramatically change the nature of computing, but
has mostly been a theoretical subject, partly due to the difficulties in building physical
quantum circuits. However, recent progress has enabled the first, albeit small, quantum
devices to be constructed, see for example [1] utilising photonics. These devices are not
complete quantum computers, but consist of simple quantum circuits capable of processing
information to solve specific problems. Critically, these devices can be fabricated in silicon
which could lead to their integration with conventional microelectronics. How would such a
hybrid of conventional and quantum microprocessor be used? Co-processor architectures have
been developed in the past but perhaps the most promising context would be to consider the
quantum device as an accelerator.

There are several examples of modern heterogeneous computer architectures. For example,
Graphical Processing Units (GPUs) have been used extensively in the field of scientific
numerical computing to accelerate specific aspects of these calculations, where some suitably
defined compute kernel is offloaded from the CPU and executed faster on the GPU. Another
analogy can be drawn with field programmable gate arrays (FPGAs) where particular
computational patterns in software can be instantiated in hardware using the reprogrammable
logic of these devices, see for example [31 13]. Rather than accelerating an entire kernel as
would be required for a GPU, a quantum device could be employed to accelerate a specific
computational pattern. Moreover, as this device would function as an accelerator, a complete
quantum computer would not be required. Furthermore, the effects of quantum decoherence,
which destroys quantum information, can be mitigated because such quantum circuits need
only be in an entangled state for a brief period.

The addition and multiplication of small integers are the simplest computational patterns.
Here, the manipulation of $n$ classically represented integers of size $k$ bits by a quantum
circuit is considered. A key consideration is the number of parallel steps it takes to execute a
quantum circuit, i.e. the depth of the quantum circuit implementing the computation. The
first quantum addition circuit was proposed by Vedral et al. in 1995 [4]. It is a quantum
version of the classical ripple carry adder. The quantum ripple carry adder has been further
studied in the circuit model of quantum computing [H [3 HI IH] and in the Measurement Based
Quantum Computing Model (MBQC) [10]. The quantum circuits implementing the classical
carry-lookahead adder have been investigated in [TT1|H1[T2] and the MBQC design in [13].

Most of the quantum adders constructed are thus quantum versions of classical ripple carry
or carry-lookahead adders. A notable exception is the addition circuit proposed by Draper in
[14], which utilises the quantum Fourier transform (QFT) operation. Whilst this particular
circuit performs no better than a classical carry-lookahead circuit, employing circuit features
which are specific to quantum circuits rather than quantum analogues of classical circuits
may allow performance gains to be achieved.

Quantum arithmetic circuits for integer multiplication have been proposed in [15, 16], but
this is the first work studying the multiply-add operation in the quantum setting. Although
quantum arithmetic logic units (ALUs) have been proposed in several papers [17, 18, 19, 20],
none of them analyses whether the addition and multiplication could be merged into a single,
more efficient multiply-add operation.

In this work, the QFT, highly entangled quantum states obtained through "fanning-out" [21]
of the QFT states, and the classical properties of a hybrid circuit are combined
together to produce a QFT multiply-add circuit (QMAC) for classical integers which
outperforms a conventional multiply-add unit.

The rest of the paper is organised as follows: In Section 2 the QMAC is described. In
Section 3 the depth of the circuit is analysed and compared to a conventional multiply-add
circuit. Finally, in Section 4 the results are presented.

2 The QFT Multiply-Add Circuit 

Consider a unitary operator, $M$, which when combined with the QFT can be used to compute
the action of a classical integer MAC: $z + y \cdot x$, where $z, y, x \in \mathbb{Z}$. This
operator is then decomposed into single qubit gates. The decomposition shown is particularly
useful, since it allows for the construction of a parallel hybrid circuit which is presented at
the end of this section. For the sake of notational simplicity, only unsigned integers are
considered but the presented circuits can work with signed integers if the two's complement
representation is used.

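Let $z_k \cdots z_2 z_1$ be the binary representation of a $k$ bit integer $z$ such that
$z = z_k 2^{k-1} + \cdots + z_2 2^1 + z_1 2^0$ and $0.z_k \cdots z_2 z_1$ the binary fraction
$z_k/2^1 + \cdots + z_2/2^{k-1} + z_1/2^k$. Then the QFT of a $k$ qubit computational basis
state $|z\rangle$ can be written as [22]:

$$
\mathrm{QFT}|z\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i \cdot 0.z_1}|1\rangle\right)
\otimes \cdots \otimes
\frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i \cdot 0.z_k \cdots z_2 z_1}|1\rangle\right). \tag{1}
$$

Let $M_j(y,x)$ be a single qubit unitary operator defined as follows:

$$ M_j(y,x)|0\rangle = |0\rangle \tag{2} $$

$$ M_j(y,x)|1\rangle = e^{2\pi i \cdot 0.y_j \cdots y_2 y_1 \cdot x}|1\rangle, \tag{3} $$

where $x, y \in \mathbb{Z}$ are $k$ bit integers. The effect of applying $M_j(y,x)$ to a state
which has the first $j$ bits of a $k$-bit integer $z$ encoded in its relative phase is

$$
M_j(y,x)\frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i \cdot 0.z_j \cdots z_2 z_1}|1\rangle\right)
= \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i (0.z_j \cdots z_2 z_1 + 0.y_j \cdots y_2 y_1 \cdot x)}|1\rangle\right). \tag{4}
$$

The above equation shows that the action of $M_j(y,x)$ is similar to applying a MAC operator
to the binary fraction encoded in the relative phase, i.e. it multiplies the binary fraction
$0.y_j \cdots y_2 y_1$ with $x$ and adds it to $0.z_j \cdots z_2 z_1$. The $k$ qubit quantum
operator $M$ corresponding to a MAC is defined as

$$ M(y,x) = M_1(y,x) \otimes M_2(y,x) \otimes \cdots \otimes M_k(y,x). \tag{5} $$

The application of $M(y,x)$ to $\mathrm{QFT}|z\rangle$ will result in the state

$$
\begin{aligned}
M(y,x)\,\mathrm{QFT}|z\rangle
&= \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i (0.z_1 + 0.y_1 \cdot x)}|1\rangle\right) \\
&\quad \otimes \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i (0.z_2 z_1 + 0.y_2 y_1 \cdot x)}|1\rangle\right) \otimes \cdots \\
&\quad \otimes \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i (0.z_k \cdots z_2 z_1 + 0.y_k \cdots y_2 y_1 \cdot x)}|1\rangle\right)
= \mathrm{QFT}|z + yx\rangle.
\end{aligned} \tag{6}
$$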

Applying the $\mathrm{QFT}^\dagger$ operator and measuring the result in the computational
basis gives the output $z + y \cdot x$, which would also be the effect of a classical MAC applied
to $x, y, z$. Note that since $e^{2\pi i (m + 0.z_l \cdots z_2 z_1)} = e^{2\pi i \cdot 0.z_l \cdots z_2 z_1}$
for every $m \in \mathbb{Z}$, $z \in \{0,1\}^k$, and $l \in \{1, 2, \cdots, k\}$,
the output is computed modulo $2^k$.
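
The phase bookkeeping behind Eqs. (1) to (6) can be checked classically. The following is a minimal sketch, not a quantum simulation: it tracks only the relative phase of each qubit as an exact fraction of a full turn. The function names `qft_phases`, `apply_mac` and `decode` are hypothetical helpers introduced for this illustration.

```python
from fractions import Fraction

def qft_phases(z, k):
    """Relative phases (fractions of a full turn) of the k qubits of QFT|z>.

    Qubit j (1-based) carries the binary fraction 0.z_j...z_1 = (z mod 2^j) / 2^j.
    """
    return [Fraction(z % 2**j, 2**j) for j in range(1, k + 1)]

def apply_mac(phases, y, x):
    """Apply M(y, x): qubit j gains the phase 0.y_j...y_1 * x, taken modulo 1."""
    out = []
    for j, phase in enumerate(phases, start=1):
        y_frac = Fraction(y % 2**j, 2**j)      # the binary fraction 0.y_j...y_1
        out.append((phase + y_frac * x) % 1)
    return out

def decode(phases):
    """Read the integer back from the last qubit, whose phase is z / 2^k."""
    k = len(phases)
    return int(phases[-1] * 2**k)

k = 4
z, y, x = 3, 5, 6
after = apply_mac(qft_phases(z, k), y, x)
# The phases agree qubit by qubit with those of QFT|(z + y*x) mod 2^k>.
assert after == qft_phases((z + y * x) % 2**k, k)
print(decode(after))  # (3 + 5*6) mod 16 = 1
```

Integer parts of the accumulated phase are discarded by the reduction modulo 1, which is why the result wraps around at $2^k$.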

Any realistic quantum device would have to be built using quantum gates acting on a
limited number of qubits, thus the $M(y,x)$ operator needs to be decomposed into one- and
two-qubit quantum gates. To obtain a performance that surpasses classical MACs, the $M(y,x)$
operation will be constructed in a way that allows every gate in its circuit to be applied in
one simultaneous step. The following gates are used in the circuit construction:
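
$$
R_j = \begin{pmatrix} 1 & 0 \\ 0 & e^{2\pi i / 2^j} \end{pmatrix}, \qquad
\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \tag{7}
$$

where $R_j$ is a phase shift gate and CNOT is the controlled NOT gate. Note that the operator
$R_j$ has the following properties:

$$ R_0 = I, \qquad R_j^{2^m} = R_{j-m}. \tag{8} $$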

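The $j$-qubit fan-out operator $F_j$, which maps
$|a\rangle|b_1\rangle \cdots |b_{j-1}\rangle \to |a\rangle|b_1 \oplus a\rangle \cdots |b_{j-1} \oplus a\rangle$,
where $b_i \oplus a = (b_i + a) \bmod 2$, is also required. It is trivial to see that
$F_j^\dagger = F_j$. The operator
$Q_j(y) = R_{j \cdot y_1} \cdots R_{2 \cdot y_{j-1}} R_{1 \cdot y_j}$ is used as a sub-circuit in
the $M(y,x)$ construction. The effect of $Q_j(y)$ on the one qubit computational basis is:

$$ Q_j(y)|0\rangle = |0\rangle \tag{9} $$

$$ Q_j(y)|1\rangle = e^{2\pi i (y_1/2^j + \cdots + y_{j-1}/2^2 + y_j/2^1)}|1\rangle \tag{10} $$

$$ \phantom{Q_j(y)|1\rangle} = e^{2\pi i \cdot 0.y_j \cdots y_2 y_1}|1\rangle. \tag{11} $$

Note that since $y_m$, where $m \in \{1, 2, \cdots, j\}$, is a binary value and
$R_0 = R_l^0 = I$ for every $l \in \mathbb{Z}$, the operator $Q_j(y)$ can be written as follows:

$$ Q_j(y) = R_j^{y_1} \cdots R_2^{y_{j-1}} R_1^{y_j}. \tag{12} $$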
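
As a sanity check of this form of $Q_j(y)$, the sketch below lists which $R_l$ gates a given $y$ switches on and sums their phases. It assumes, consistently with the action of $M_j$ in Eq. (3), that $R_l$ contributes a phase of $1/2^l$ of a full turn to $|1\rangle$; the helper names are hypothetical.

```python
from fractions import Fraction

def bit(v, m):
    """Bit v_m of the integer v, where v_1 is the least significant bit."""
    return (v >> (m - 1)) & 1

def q_gates(j, y):
    """Indices l of the R_l gates switched on in Q_j(y): bit y_m enables R_{j+1-m}."""
    return [j + 1 - m for m in range(1, j + 1) if bit(y, m)]

def q_phase(j, y):
    """Phase on |1> as a fraction of a full turn; each R_l contributes 1/2^l (assumed)."""
    return sum((Fraction(1, 2**l) for l in q_gates(j, y)), Fraction(0)) % 1

y, j = 0b101, 3
print(q_gates(j, y))   # [3, 1]: R_3 from y_1 and R_1 from y_3
print(q_phase(j, y))   # 5/8, i.e. the binary fraction 0.101
```

Each classical bit of $y$ decides on exactly one gate, which is what later allows the gates to be classically controlled.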
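
Furthermore, from the equalities (8) and (9) it follows that:

$$ Q_j(y)^{2^m} = Q_{j-m}(y). \tag{13} $$

The above equation implies that $Q_{j-m} = I$ if $j - m < 1$, therefore the $M_j(y,x)$ operator
can be written as:

$$
\begin{aligned}
M_j(y,x)|1\rangle &= e^{2\pi i \cdot 0.y_j \cdots y_1 \cdot x}|1\rangle \\
&= e^{2\pi i \cdot 0.y_j \cdots y_1 \cdot (x_1 \cdot 2^0 + x_2 \cdot 2^1 + \cdots + x_k \cdot 2^{k-1})}|1\rangle \\
&= Q_j(y)^{x_k \cdot 2^{k-1}} \cdots Q_j(y)^{x_2 \cdot 2^1} Q_j(y)^{x_1 \cdot 2^0}|1\rangle \\
&= Q_{j-k+1}(y)^{x_k} \cdots Q_{j-1}(y)^{x_2} Q_j(y)^{x_1}|1\rangle \\
&= Q_1(y)^{x_j} \cdots Q_{j-1}(y)^{x_2} Q_j(y)^{x_1}|1\rangle
\end{aligned} \tag{14}
$$

$$ M_j(y,x)|0\rangle = Q_1(y)^{x_j} \cdots Q_{j-1}(y)^{x_2} Q_j(y)^{x_1}|0\rangle = |0\rangle. \tag{15} $$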
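
The decomposition in Eq. (14) rests on the identity $Q_j(y)^{2^m} = Q_{j-m}(y)$, with $Q_l = I$ for $l < 1$. Both can be checked on the level of phases. The sketch below is an illustration under the assumption that $Q_l(y)$ puts the phase $0.y_l \cdots y_1$ on $|1\rangle$; all function names are hypothetical.

```python
from fractions import Fraction

def bit(v, m):
    """Bit v_m of the integer v, where v_1 is the least significant bit."""
    return (v >> (m - 1)) & 1

def q_phase(l, y):
    """Phase of Q_l(y) on |1>: the binary fraction 0.y_l...y_1; identity for l < 1."""
    return Fraction(y % 2**l, 2**l) if l >= 1 else Fraction(0)

def q_power_phase(j, y, m):
    """Phase of Q_j(y) raised to the power 2^m, reduced modulo one full turn."""
    return (q_phase(j, y) * 2**m) % 1

def m_phase_direct(j, y, x):
    """Eq. (3): the phase 0.y_j...y_1 * x, reduced modulo one full turn."""
    return (q_phase(j, y) * x) % 1

def m_phase_decomposed(j, y, x):
    """Last line of Eq. (14): Q_l(y) is applied exactly when x_{j+1-l} = 1."""
    total = sum((q_phase(l, y) for l in range(1, j + 1) if bit(x, j + 1 - l)),
                Fraction(0))
    return total % 1

# Exhaustive check for 4 bit operands.
for j in range(1, 5):
    for y in range(16):
        for m in range(0, 6):
            assert q_power_phase(j, y, m) == q_phase(j - m, y)
        for x in range(16):
            assert m_phase_direct(j, y, x) == m_phase_decomposed(j, y, x)
```

Bits $x_i$ with $i > j$ drop out because they only contribute whole turns to qubit $j$.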



C M. Maynard and E. Pius 5 



The decomposition of $M_j(y,x)$ into $Q_j(y)$ operators (Eq. 14 and 15) and of $Q_j(y)$ into
$R_j$ operators (Eq. 12) will be used to construct a parallel quantum circuit for $M(y,x)$. Note
that the descriptions of $M(y,x)$, $M_j(y,x)$, and $Q_j(y)$ contain the arguments $x$ and $y$.
This is undesired for practical implementations of a circuit, since a circuit cannot in general
change depending on the input. In the design below, this problem is resolved by using the bits
of the arguments as controls for quantum gates, i.e. the value of classical bits is used to
determine if a particular quantum gate should be applied or not. First, the parallel hybrid
circuit for $Q_j(y)$ is constructed. Since $R_j^0 = I$ and $R_j^1 = R_j$, the effect of an input
bit $y_m$ in Eq. 12, where $m \in \{1, 2, \cdots, j\}$, is to control the application of the gate
$R_{j+1-m}$. Thus the quantum circuit of $Q_j(y)$ can be constructed using only single qubit
$R_j$ gates controlled by classical bits $y_m$. All of the $R_j$ gates in $Q_j(y)$ can be applied
in parallel using auxiliary qubits and the $F_j$ gate [H]. Thus the parallel hybrid circuit
$F_j Q_j F_j$ of $Q_j(y)$ can be constructed as shown in Figure 1.

Fig. 1. The parallel version of the operator $Q_j(y)$. The $F_j$ blocks can be applied in
$O(\log j)$ steps [21]. $|\psi\rangle$ is an arbitrary 1 qubit state. The dashed lines represent
classical bits and the continuous lines qubits. A single line crossing a wire represents a single
bit/qubit and three lines crossing a wire represent multiple bits/qubits.

Since $Q_j(y)^0 = Q_j(0) = I$ and $x_l$ is a binary value, $M_j$ can be written as
$M_j(y,x) = Q_1(y \cdot x_j) \cdots Q_{j-1}(y \cdot x_2) Q_j(y \cdot x_1)$. The values of both
$y$ and $x$ are classical bit-strings, hence the operation $y \cdot x_l$ can be performed
classically in one parallel computational step using an AND operator between $x_l$ and every
bit of $y$. Since $M_j(y,x)$ can be decomposed into diagonal operators $Q_j(y)$, there exists a
parallel hybrid circuit where all the $Q_j(y)$ operators are applied simultaneously [21]. In
this circuit's construction the parallel hybrid circuit of $Q_j$ is used, as seen in Figure 2.
Since $M(y,x)$ is a tensor product of the operators $M_j(y,x)$, where $j \in \{1, \cdots, k\}$,
the circuit of $M(y,x)$ can be created by simply applying an appropriate $M_j$ sub-circuit to
each of the input qubits as shown in Figure 3. The circuit $FMF$ in the aforementioned figure
corresponds to the operator $M(y,x)$ and together with the QFT comprises the quantum MAC
circuit.
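
The classical control layer of a single $M_j$ sub-circuit can be sketched as follows. Each $R$ gate inside $Q_l(y \cdot x_{j+1-l})$ is switched by one AND of a bit of $y$ and a bit of $x$. This is an illustrative model only: it assumes $R_l$ contributes a phase of $1/2^l$ of a full turn, and the function names are hypothetical.

```python
from fractions import Fraction

def bit(v, m):
    """Bit v_m of the integer v, where v_1 is the least significant bit."""
    return (v >> (m - 1)) & 1

def m_j_controls(j, y, x):
    """One entry (l, m, control) per R gate of M_j.

    The entry describes gate R_{l+1-m} inside Q_l, switched by the classical
    bit y_m AND x_{j+1-l} (the single layer of AND gates).
    """
    return [(l, m, bit(y, m) & bit(x, j + 1 - l))
            for l in range(1, j + 1) for m in range(1, l + 1)]

def m_j_phase(j, y, x):
    """Total phase on |1> when every enabled R gate fires in the same step."""
    total = sum((Fraction(c, 2**(l + 1 - m)) for l, m, c in m_j_controls(j, y, x)),
                Fraction(0))
    return total % 1

j, y, x = 3, 0b101, 0b110
print(len(m_j_controls(j, y, x)))   # 6 gates, i.e. j*(j+1)/2
print(m_j_phase(j, y, x))           # 3/4, equal to 0.101 * 6 modulo 1
```

Because all gates are diagonal, the order in which the enabled gates are applied does not matter, so summing their phases models the simultaneous application.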



3 Analysis of the Circuit 

The main result of this work concerns the depth of the hybrid MAC circuit in the case of
sequential application. When the circuit $FMF$ in Figure 3 is applied repeatedly, the
only $F$ gates having a non-trivial effect will be at the beginning and the end of the
computation. This is due to the fact that $FF = FF^\dagger = I$ and thus $(FMF)(FMF) = FMMF$.
Combining the circuit in Figure 3 with the QFT and using it to perform the multiply-addition
operation of $n$ integers results in the hybrid circuit depicted in Figure 4. As can be seen
from the figure, the overall depth will depend on the depth of $M$, which according to the next
lemma is constant.

Fig. 2. The parallel hybrid circuit of the $M_j(y,x)$ operator.

Fig. 3. The parallel hybrid circuit of the $M(y,x)$ operator.

Lemma 1 The depth of the hybrid circuit $M$ is 2.

Proof. It can be seen from Figure 3 that the depth of the $M$ circuit has to be equal to
the maximum depth of any $M_j$ sub-circuit, where $j \in \{1, \cdots, k\}$. It is apparent that
by substituting the $Q_j$ circuits in $M_j$, shown in Figure 2, with the one described in
Figure 1, a circuit with one layer of classical AND gates and one layer of single qubit $R_m$
gates can be constructed. Thus the combined depth of any $M_j$, and hence $M$, circuit is 2. □

When determining the depth of a circuit, gates of variable size, such as the $F$ gate, have
to be decomposed into one- and two-qubit quantum gates. An $F_m$ gate can be written as an
$O(\log m)$ depth circuit consisting of only CNOT gates, where $m$ is the number of qubits $F_m$
acts upon. From Figure 3 it can be seen that the number of qubits $F$ acts upon is equal to
the number of qubits $M$ acts upon. This in turn is equal to the number of quantum gates in
$M$ since by Lemma 1 there is only one layer of quantum gates. Thus the depth of the circuit
in Figure 4 is determined by the number of gates in $M$.
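
One way to see where the logarithmic depth comes from is a doubling scheme: in every layer, each wire that already holds a copy feeds one fresh wire. The sketch below builds such CNOT layers and checks them on classical basis values. It assumes that the auxiliary wires start in 0, so it illustrates the copying use of fan-out rather than the general operator on arbitrary targets; the function names are hypothetical and the construction is not claimed to be the one used in the cited work.

```python
def fanout_layers(m):
    """CNOT layers that copy wire 0 onto wires 1..m-1 by repeated doubling.

    Each layer is a list of (control, target) pairs acting on disjoint wires,
    so all CNOTs of one layer can be applied in the same time step.
    """
    layers = []
    have = 1                      # wires 0..have-1 already hold the control value
    while have < m:
        layers.append([(src, src + have) for src in range(have) if src + have < m])
        have = min(2 * have, m)
    return layers

def apply_cnots(layers, bits):
    """Apply the layers to a list of classical bits (basis state of the register)."""
    bits = list(bits)
    for layer in layers:
        for control, target in layer:
            bits[target] ^= bits[control]
    return bits

m = 8
layers = fanout_layers(m)
print(len(layers))                                  # 3 layers for 8 wires
copied = apply_cnots(layers, [1] + [0] * (m - 1))
print(copied)                                       # [1, 1, 1, 1, 1, 1, 1, 1]
# Running the layers in reverse order undoes the copy again.
print(apply_cnots(layers[::-1], copied))            # [1, 0, 0, 0, 0, 0, 0, 0]
```

The number of layers is $\lceil \log_2 m \rceil$, since the number of wires holding a copy doubles in every step.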



Lemma 2 The size of the hybrid circuit $M$ is $O(k^3)$.

Fig. 4. The hybrid quantum circuit computing the MAC operation $n$ times in a sequence with
multiplicand pairs $(x_1, y_1), \cdots, (x_n, y_n)$, where $z, x_i, y_i \in \mathbb{Z}$. Here
$z' = z + \sum_{i=1}^{n} x_i \cdot y_i$.

Proof. Let $\mathrm{size}(C)$ be the size of a quantum circuit $C$, i.e. the number of one- and
two-qubit quantum gates in the decomposition of $C$. Every $M_j$ sub-circuit in $M$ corresponds
to one $M_j(y,x)$ operator in the definition of $M(y,x)$. Furthermore, every $Q_l$ sub-circuit in
$M_j$ corresponds to a $Q_l(y)$ operator in the definition of $M_j(y,x)$ (Eq. 14 and 15) and each
$R_m$ gate in $Q_l$ corresponds to an $R_m$ operator in the definition of $Q_l(y)$ (Eq. 12). It
can be seen from Eq. 12 that $\mathrm{size}(Q_l) = l$ and the size of the circuit $M$ is therefore

$$
\mathrm{size}(M) = \sum_{j=1}^{k} \mathrm{size}(M_j)
= \sum_{j=1}^{k} \sum_{l=1}^{j} \mathrm{size}(Q_l)
= \sum_{j=1}^{k} \sum_{l=1}^{j} l
= \frac{k(k+1)(k+2)}{6} \in O(k^3).
$$ □
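
The gate count can be reproduced by summing over the sub-circuits directly. The sketch below uses only the facts stated in the proof, namely that $Q_l$ contains $l$ gates and that $M_j$ contains $Q_1, \cdots, Q_j$; the closed form $k(k+1)(k+2)/6$ it compares against is the standard sum of triangular numbers. Function names are hypothetical.

```python
def size_q(l):
    """Q_l consists of l classically controlled R gates."""
    return l

def size_m_j(j):
    """M_j contains the sub-circuits Q_1, ..., Q_j."""
    return sum(size_q(l) for l in range(1, j + 1))

def size_m(k):
    """M contains the sub-circuits M_1, ..., M_k."""
    return sum(size_m_j(j) for j in range(1, k + 1))

for k in (1, 2, 4, 8, 16):
    # Gate count by direct summation next to the closed form k(k+1)(k+2)/6.
    print(k, size_m(k), k * (k + 1) * (k + 2) // 6)
```

For 4 bit operands this gives 20 gates, and the count grows cubically in $k$.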

Now the overall depth of a hybrid circuit performing n MAC operations on k bit integers 
can be estimated. This is done for both the exact and approximate case. 



Theorem 1 There exists a hybrid quantum circuit with depth in $O(n + k)$ which performs
$n$ multiply additions of $k$ bit integers exactly using $O(k^3)$ qubits.

Proof. The quantum circuit used to perform $n$ multiply additions is shown in Figure 4.
The number of qubits used by the circuit is equal to the number of qubits the $M$ operator
acts upon, which by Lemma 2 is $O(k^3)$.

The QFT and $\mathrm{QFT}^\dagger$ of $k$ qubits can be applied in $O(k)$ depth [53]. The fan-out
operations $F$ can be constructed using a tree-like structure so that the depth of that circuit
is logarithmic in the number of qubits to which they are applied [21], i.e.
$O(\log k^3) = O(\log k)$. The $M$ circuit can be performed in exactly two steps as proven in
Lemma 1 and it is applied exactly $n$ times. Thus the overall depth, i.e. the number of
parallel steps required for the application of the circuit, is
$n \cdot O(1) + 2 \cdot O(\log k) + 2 \cdot O(k) = O(n + k)$. □

In practice it is unlikely that any quantum gates, or indeed classical logic gates, could be
implemented perfectly. That is, there will always be a small probability of the implemented
gate failing, resulting in a wrong answer. However, it is sufficient to obtain the correct
result with high enough probability. When an exact result is not required, the depth of a
hybrid circuit computing multiple MAC operations can be even smaller. A unitary operator
is approximated with precision $\epsilon$ if for any pure input quantum state the Euclidean
distance between the outputs of the desired unitary $U$ and the implemented unitary $V$ is at
most $\epsilon$.

Theorem 2 There exists a hybrid quantum circuit with depth
$O(n + \log k + \log\log 1/\epsilon)$ which performs $n$ multiply additions of $k$ bit integers
with precision $\epsilon$ using $O(k^3 + k \cdot \log(k/\epsilon))$ qubits.

Proof. To obtain a better depth than in the exact case a slightly modified version of the
circuit in Figure 4 is used in the approximate case. The initial QFT can be replaced with a
single layer of Hadamard gates applied to the $k$ qubit state $|0\rangle$. Note that this is
equivalent to applying QFT to $|0\rangle$. Next the $M$ circuit is used to add $z \cdot 1$ to
$\mathrm{QFT}|0\rangle$, which gives the state
$\mathrm{QFT}|0 + z \cdot 1\rangle = \mathrm{QFT}|z\rangle$. This is the same result as would be
obtained by the exact circuit, but can be done in constant depth.

An approximate version of $\mathrm{QFT}^\dagger$, introduced in [23], can be used as the final
step. This $\mathrm{QFT}^\dagger$ has depth $O(\log k + \log\log 1/\epsilon)$ and size
$O(k \cdot \log(k/\epsilon))$ with precision $\epsilon$. The depths of the fan-out and $M$
operations are discussed above. The $M$ circuit is applied exactly $n + 1$ times.
Thus the overall depth is
$(n + 1) \cdot O(1) + O(1) + 2 \cdot O(\log k) + O(\log k + \log\log 1/\epsilon) = O(n + \log k + \log\log 1/\epsilon)$.

The number of qubits used by the circuit is equal to the maximum number of qubits the
$M$ operator acts upon, which by Lemma 2 is $O(k^3)$, and the number of qubits
$\mathrm{QFT}^\dagger$ acts upon. Thus the total number of qubits acted upon is
$O(k^3 + k \cdot \log(k/\epsilon))$. □
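
The replacement of the initial QFT can be illustrated with the same phase bookkeeping used for the exact circuit. In the sketch below, the Hadamard layer is modelled as a register whose relative phases are all zero, and the first application of $M$ loads $z$ with multiplier 1. Only phases are tracked, so neither the approximate inverse QFT nor its error $\epsilon$ is modelled; the function names are hypothetical.

```python
from fractions import Fraction

def apply_mac(phases, y, x):
    """Apply M(y, x): qubit j gains the phase 0.y_j...y_1 * x, taken modulo 1."""
    return [(phase + Fraction(y % 2**j, 2**j) * x) % 1
            for j, phase in enumerate(phases, start=1)]

def approximate_front_end(z, pairs, k):
    """Hadamard layer, then M(z, 1), then one M(y, x) per multiplicand pair.

    Returns the final phases and the number of times M was applied.
    """
    phases = [Fraction(0)] * k          # H on every qubit of |0...0>: all phases zero
    phases = apply_mac(phases, z, 1)    # loads z, giving the phases of QFT|z>
    applications = 1
    for y, x in pairs:
        phases = apply_mac(phases, y, x)
        applications += 1
    return phases, applications

k = 5
z, pairs = 7, [(3, 4), (2, 9), (5, 5)]
phases, applications = approximate_front_end(z, pairs, k)
expected = (z + sum(y * x for y, x in pairs)) % 2**k
assert phases == [Fraction(expected % 2**j, 2**j) for j in range(1, k + 1)]
print(applications)   # n + 1 = 4 applications of M
```

The extra application of $M$ costs constant depth, which is what removes the $O(k)$ term of the exact front end.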

4 Results and Discussion 

The depth of the proposed circuit for performing $n$ multiply-add operations on integers of
$k$ bits is $O(n + k)$ for the exact circuit and $O(n + \log k)$ for the approximate circuit.
The classical implementations of MAC are limited by the depth complexity of the multiplication
operations. This is true even for the lowest depth multiplication circuits such as Wallace [24]
and Dadda [25] multipliers which are used in most CPU architectures and have a depth of
$O(\log k)$. Thus the sequential application of $n$ classical MACs requires at least
$O(n \log k)$ parallel steps. It is unlikely that gate delays in classical and quantum circuits
will be the same. Indeed, they vary for different classical circuits. However, in this analysis
of the different circuits the simple counting of the number of gates is used. It is worth noting
that the advantage in depth gained by using a QMAC increases with the number of sequential
applications and the size of integers used. Thus independent of the gate delays, there will be
for every integer size an $n$ such that performing at least $n$ MAC operations has less depth
when using the hybrid QMAC circuit than a classical one.
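
To make the crossover argument concrete, the toy model below compares the two depth expressions under unit step costs. All constants are assumptions made purely for illustration: the analysis above gives only asymptotic bounds and explicitly notes that real gate delays differ, so the numbers printed here say nothing about actual hardware. The function names are hypothetical.

```python
def ceil_log2(v):
    """Smallest integer e with 2**e >= v, for v >= 1 (exact integer arithmetic)."""
    return (v - 1).bit_length()

def hybrid_depth(n, k, mac_step=2, fanout_step=1, qft_step=1):
    """Toy depth of the exact hybrid circuit: n MAC layers, two fan-outs, two QFTs.

    Assumed costs: 2 steps per M (Lemma 1), ceil(log2(k^3)) steps per fan-out
    and k steps per QFT, each with a hypothetical unit constant.
    """
    return mac_step * n + 2 * fanout_step * ceil_log2(k**3) + 2 * qft_step * k

def classical_depth(n, k, mul_step=1):
    """Toy depth of n sequential classical MACs at ceil(log2 k) steps each (assumed)."""
    return mul_step * n * ceil_log2(k)

def crossover(k, n_max=100_000):
    """Smallest n for which the toy hybrid depth is below the toy classical depth."""
    for n in range(1, n_max + 1):
        if hybrid_depth(n, k) < classical_depth(n, k):
            return n
    return None   # no crossover within n_max under these assumed constants

print(crossover(32))   # 32 under the unit-cost assumptions above
print(crossover(4))    # None: with these constants 2n + overhead never beats 2n
```

Under these particular constants a crossover exists only once $\lceil \log_2 k \rceil$ exceeds the assumed per-MAC cost of the hybrid circuit; different delay ratios shift the threshold.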

The small depth of the QMAC is a consequence of using the QFT, a highly entangled 
quantum state and classical fan-out, that is, copying of bits. First, since the MAC operation 
is performed on the QFT state, only diagonal gates are necessary. This makes it possible to 
entangle the quantum register with auxiliary qubits in a way that allows the simultaneous 
application of every single-qubit quantum gate. Second, the states of a bit can be copied by 
using multiple output wires to more than two registers for the next computational step. Thus 
the information propagates in one step to all the quantum gates controlled by these bits. This 
can be interpreted as influencing the state of an unbounded number of qubits with just one 
fan-out operation.

The hybrid QMAC circuit implements a very specific computational pattern, the MAC.
This makes it suitable for use as an execution unit in a hybrid CPU or even a separate
accelerator device such as an FPGA or GPU in future computers. Moreover, the fact that
implementing this circuit does not require a full quantum computer makes it more likely to
be realisable in the near future. The small depth of the circuit contributes to the ease of
implementation, since the time needed to keep the quantum state coherent depends on the
circuit depth. A further consequence of the hybrid nature of the circuit is that the number of
qubits and two-qubit gates used is relatively small. Instead of using only quantum registers,
two of the three registers in the QMAC circuit are classical. Using classically controlled single
qubit gates instead of fully quantum controlled gates limits the number of two-qubit gates
used. However, the entangled state used requires $O(k^3)$ auxiliary qubits and two-qubit gates.

Future work would be to consider how to adapt the hybrid QMAC circuit to floating point
operations, which are used in most time-intensive computations. This would greatly
increase the number of problems which would benefit from quantum devices. Another direction
would be to consider hybrid circuits for other arithmetic operations, for example division, and
to look at how the different circuits can be combined together. The QMAC introduced in this
paper has a lower depth than a classical MAC only if it is applied in a sequence. Hence
combining different quantum arithmetic operators could result in an improved depth compared
with classical circuits.